Information Causality contributes to the program of deriving fundamentals of quantum theory 
from information theoretic principles. It puts restrictions on the amount of information learned by 
a party (Bob) from the other party (Alice) in a one-way communication scenario as follows. Bob 
receives an index b, and after a one-way communication from Alice, tries to recover a part of Alice's 
input. Because of the possibility of cloning, this game in its completely classical form is equivalent 
to one in which there are several Bobs indexed by b, who are interested in recovering different parts 
of Alice's input string, after receiving a public message from her. Adding a private message from 
Alice to each Bob, and assuming that the game is played many times, we obtain the Gray-Wyner 
problem for which a complete characterization of the achievable region is known. In this paper, 
we first argue that in the classical case Information Causality is only a single point in the dual 
of the Gray-Wyner region. Next, we show that despite the fact that cloning is impossible in a 
general physical theory, the result from classical world carries over to any physical theory provided 
that it satisfies a new property. This new property of the physical theory is called 'Accessibility of 
Mutual Information' and holds in the quantum theory. We conclude that the Gray-Wyner region 
completely characterizes all the inequalities corresponding to the game of Information Causality. In 
other words, we provide infinitely many inequalities, of which Information Causality is only one. 

In the second part of the paper we show that Information Causality leads to a non-trivial lower 
bound on the communication cost of simulating a given non-local box when the parties are allowed 
to share entanglement. We also consider the same problem when the parties are provided with 
preshared randomness. 



I. INTRODUCTION 

Non-locality is arguably the most fundamental feature of quantum physics. Bell's theorem [1], as verified by 
experiments [2], states that there are correlations in nature that cannot be explained by local realistic (classical) 
theories. Bell's inequalities restrict the strength of classical correlations, while in the quantum theory correlations are 
characterized by Tsirelson's bounds [3]. The latter bounds, however, heavily rely on the seemingly ad hoc postulates 
of quantum mechanics. On the other hand, non-locality, the property that makes physical theories depart from the 
classical ones, is a fundamental feature of nature rather than quantum mechanics by itself. Tsirelson's bounds then 
do not provide a satisfactory answer to the problem of quantifying non-locality. 

Recently, there has been a stream of work aiming to understand non-locality from more fundamental principles. 
No-signaling, the first such principle, does not describe the correlations of quantum physics, since non-signaling 
PR-boxes [4] maximally violate the Tsirelson bound (and hence Bell's inequality) for the CHSH expression [5] and do 
not seem to be physical. Nevertheless, the recently proposed principle of Information Causality [6], a generalization 
of no-signaling, gives exactly Tsirelson's quantum bound for the CHSH expression. It is therefore natural to ask 
whether Information Causality or other information theoretic principles can further our understanding of non-locality. 

A. Information Causality 

Let us briefly explain the game of Information Causality. Alice receives the bit-string $\vec{a} = (a_1, \dots, a_N)$ 
consisting of i.i.d. random bits, and Bob gets an index $1 \le b \le N$. Bob's goal is to output $a_b$ upon receiving a 
classical message $x$ from Alice. Assuming that $\beta_i$ is Bob's guess of $a_i$ when $b = i$, Information Causality 
states that 

$$H(x) \ge \sum_{i=1}^{N} I(a_i; \beta_i \mid b = i). \tag{1}$$

It turns out one can rewrite this inequality in terms of entropies as follows: 

$$H(x) + \sum_{i=1}^{N} H(a_i \mid \beta_i, b = i) \ge H(\vec{a}). \tag{2}$$

Despite its success in characterizing certain regions of non-local correlations, Information Causality seems specifically 
designed for the CHSH. The underlying game of Information Causality is not the general one-way communication 
scenario one could consider. The parallel repetition of the game cannot be expressed as a special instance of the game 
itself, as one would expect from an information theoretic concept (information theoretic concepts are mainly defined 
in an asymptotic sense) . Moreover, the final inequality of Information Causality seems arbitrary and one may ask 
about other combinations of the terms appearing in its expression. Here we argue that the individual terms are indeed 
the right ones, but the combination in Information Causality is only a special one. 

Consider the game of Information Causality in its completely classical form. Classicality enables us to assume 
that there are $N$ Bobs instead of one. We denote these $N$ Bobs by Bob$_1$, ..., Bob$_N$. The goal of Bob$_i$ is to 
find $a_i$. Moreover, we may assume that shared randomness is indeed shared amongst Alice and all Bobs, and all of them 
receive the message $x$. Then the first term $H(x)$ of (2) is the amount of information that is sent to all Bobs; the 
second term $H(a_i \mid \beta_i, b = i)$ expresses the remaining uncertainty of Bob$_i$ about $a_i$. We can interpret 
this as the average number of extra bits that Alice needs to privately send to Bob$_i$ to enable the recovery of $a_i$ 
by this party if they were to play multiple copies of this game in parallel (the Slepian-Wolf theorem). Since 
Bob$_1$, ..., Bob$_N$ altogether can recover the string $\vec{a}$, the total flow of information from Alice should be 
at least $H(\vec{a})$ by the cut-set bound. That is, the sum of the terms on the left hand side of (2) should dominate 
$H(\vec{a})$. This gives a new proof of Information Causality in the classical world. 

The above game among Alice and the multiple copies of Bob has a similar setup to the Gray-Wyner problem [7]. 
The Gray-Wyner problem will be rigorously explained later, but roughly speaking it is defined as follows. Alice sends 
a public message $x$ to all Bobs and afterwards a private message to each Bob$_i$. The goal of Bob$_i$ is to recover 
$a_i$ with a vanishing probability of error (see Fig. 1). Let $R_0$ denote the information rate of the public message, 
and $(R_1, \dots, R_N)$ denote the rates of the private messages to Bob$_1$, ..., Bob$_N$. The Gray-Wyner region 
explicitly characterizes the set of tuples $(R_0, R_1, \dots, R_N)$ for which it is possible to satisfy the demands of 
Bob$_1$, ..., Bob$_N$. This implies that the rates $R_0 = H(x)$ and $R_i = H(a_i \mid \beta_i, b = i)$ have to lie in 
the Gray-Wyner region when the Information Causality game is played in the classical world. 

A main contribution of our work is that despite the fact that the cloning of Bob is impossible in a general physical 
theory, the tuple $(H(x), H(a_1 \mid \beta_1, b = 1), \dots, H(a_N \mid \beta_N, b = N))$ would still fall in the 
Gray-Wyner region if the physical theory satisfies a new property (besides the ones in [6]). This new property of the 
physical theory is called the Accessibility of Mutual Information and holds in the quantum theory. For any physical 
theory satisfying these properties, there are infinitely many inequalities originating from the characterization of 
the Gray-Wyner region; Information Causality is only one of them. In fact, the Gray-Wyner region completely 
characterizes all the inequalities corresponding to the game of Information Causality in the following sense. On one 
hand, the tuple $(H(x), H(a_1 \mid \beta_1, b = 1), \dots, H(a_N \mid \beta_N, b = N))$ has to be in the Gray-Wyner 
region. On the other hand, any point in the Gray-Wyner region is achievable, meaning that it can be obtained through 
a communication scheme in the classical world. 

B. Simulation of non-local correlations 

Quantifying the amount of classical communication required for simulating non-local correlations is the other well- 
studied approach, besides Bell's theorem, in the theory of non-locality (see e.g. [8-15]). A well-known result in this 
direction says that any bipartite correlation coming from one bit of entanglement (an EPR pair) can be realized in 
the classical world by only one bit of communication [8] . 

Here we introduce a novel application of Information Causality in the problem of simulating non-local correlations 
using classical one-way communication. We show that Information Causality leads to a non-trivial lower bound on 
the communication cost of simulating a given non-local box when the parties are allowed to share entanglement. 

We also make a non-technical contribution for readers interested in the communication cost of simulating a given 
non-local box when the parties share only common randomness. Information theorists working in the area of control 
have independently studied the same problem in a different context. To the best knowledge of the authors, however, 
all previous results in quantum information attack the problem from a communication complexity point of view rather 
than an information theoretic one. The communication complexity formulation of the problem turns out to be very 
difficult. The information theoretic formulation, in contrast, looks at the asymptotic limits of the problem and takes 
advantage of the laws of large numbers. Connecting these two lines of research, we report a formula that gives an 
exact expression for the optimal amount of communication needed for non-local simulation given preshared randomness. 
It should also be noted that the information theoretic characterization of the communication cost serves as a lower 
bound on the communication complexity characterization, because the former setup considers asymptotic behavior and 
is more relaxed. 

FIG. 1: The Gray-Wyner game consists of $N + 1$ players, Alice and $N$ Bobs who are indexed by $b = 1, \dots, N$. 
Alice receives the i.i.d. copies of $(a_1, \dots, a_N)$, sends public information at rate $R_0$ to all Bobs and 
private information at rate $R_i$ to Bob$_i$. The goal of Bob$_i$ is to recover $a_i$. 



II. REVIEW OF INFORMATION CAUSALITY 

In this paper we mainly adopt the notation used in [6]. Alice receives the bit-string $\vec{a} = (a_1, \dots, a_N)$ 
consisting of i.i.d. random bits, and Bob gets an index $1 \le b \le N$. Their goal is that Bob, after receiving a 
classical message $x$ from Alice, outputs $a_b$. Assuming that $\beta_i$ is Bob's guess of $a_i$ when $b = i$, 
Information Causality states that 

$$H(x) \ge \sum_{i=1}^{N} I(a_i; \beta_i \mid b = i). \tag{3}$$

Before getting into our main results and the discussion of Information Causality in the next section, we begin 
by slightly generalizing its game. The game of Information Causality, as formulated in [6], is a special one-way 
communication problem. One can generalize it by assuming that instead of outputting a single bit of Alice, Bob 
may want to compute some function $f(a, b)$ of Alice's input $a$ and his input $b$. To fit this new scenario into the 
previous setup, assume that $b$ takes values $1, 2, \dots, N$, and note that Alice may replace her input $a$ with the 
string $(a_1, a_2, \dots, a_N) = (f(a, 1), f(a, 2), \dots, f(a, N))$, which by a slight abuse of notation we represent 
by $\vec{a}$. Note that $\vec{a}$ is a sufficient statistic from the perspective of Alice, and she can use it instead 
of $a$. Bob's goal then is to find $a_b$, the $b$-th coordinate of $\vec{a}$, as before. The difference, however, is 
that the $a_i$'s are no longer i.i.d. and can be correlated. In summary, the problem of Bob aiming to compute a 
function of Alice's input can be converted to the problem of the $a_i$'s being correlated. Observe that, once 
correlated inputs $a_1, \dots, a_N$ are allowed, the parallel repetition of the game is itself an instance of the game. 

Now we should seek an adjustment of the proof of (3) in [6] that admits correlated $a_i$'s. Following the proof, 
we find that the independence of the $a_i$'s is used in [6] where the term $I(a_{i+1}, a_{i+2}, \dots, a_N; a_i)$ is 
dropped. When the $a_i$'s are correlated, we have to put these penalty terms back. A careful bookkeeping of the 
penalty terms then gives us $\sum_{i=1}^{N-1} I(a_{i+1}, a_{i+2}, \dots, a_N; a_i)$, which is equal to 
$\sum_{i=1}^{N} H(a_i) - H(\vec{a})$. Putting this penalty back into equation (3) we obtain 

$$H(x) \ge \sum_{i=1}^{N} I(a_i; \beta_i \mid b = i) - \sum_{i=1}^{N} H(a_i) + H(\vec{a}).$$

If we expand the mutual information term $I(a_i; \beta_i \mid b = i)$ as $H(a_i) - H(a_i \mid \beta_i, b = i)$, we get 
a simpler representation of the above equation: 

$$H(x) + \sum_{i=1}^{N} H(a_i \mid \beta_i, b = i) \ge H(\vec{a}). \tag{4}$$

In the rest of the paper we still call the above inequality, for correlated $a_i$'s, Information Causality. 
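
The bookkeeping identity used above, that the sum of the dropped terms $I(a_{i+1}, \dots, a_N; a_i)$ over $i = 1, \dots, N-1$ equals $\sum_i H(a_i) - H(\vec{a})$, can be checked numerically. The following is a minimal sketch in plain Python, assuming a randomly generated joint distribution on three bits; the helper names (`entropy`, `marginal`, `mutual_information`, `random_joint`) are illustrative and not taken from [6].

```python
import itertools
import math
import random


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, coords):
    """Marginal of a joint distribution over tuples, keeping the given coordinates."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[c] for c in coords)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint, left, right):
    """I(left; right) = H(left) + H(right) - H(left, right) for coordinate lists."""
    return (entropy(marginal(joint, left)) + entropy(marginal(joint, right))
            - entropy(marginal(joint, left + right)))


def random_joint(n_bits, seed=0):
    """Hypothetical test input: a random (generally correlated) distribution on n_bits bits."""
    rng = random.Random(seed)
    weights = {bits: rng.random() for bits in itertools.product((0, 1), repeat=n_bits)}
    total = sum(weights.values())
    return {bits: w / total for bits, w in weights.items()}


N = 3
joint = random_joint(N)

# Penalty: sum over i = 1..N-1 of I(a_{i+1}, ..., a_N ; a_i), with 0-based indices here.
penalty = sum(mutual_information(joint, list(range(i + 1, N)), [i]) for i in range(N - 1))

# Closed form: sum_i H(a_i) - H(a_1, ..., a_N).
closed_form = sum(entropy(marginal(joint, [i])) for i in range(N)) - entropy(joint)

print(abs(penalty - closed_form) < 1e-9)
```

The penalty vanishes exactly when the $a_i$'s are mutually independent, which is why it does not appear in the original i.i.d. formulation.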

Before finishing this section let us review the conditions given in [6] under which (4) holds in a given physical 
theory. Information Causality holds if a symmetric and non-negative mutual information can be defined for every 
two systems in the theory that obeys the following three properties directly quoted from [6]: 

(1) Consistency: If the subsystems A and B are both classical, then I(A;B) should coincide with Shannon's mutual 
information. 

(2) Data Processing Inequality: Acting on one of the parts locally by any state transformation allowed in the 
theory cannot increase the mutual information. I.e., if $B \to B'$ is a permissible map between systems, then 
$I(A; B) \ge I(A; B')$. 

(3) Chain Rule: There exists a conditional mutual information $I(A; B \mid C)$ such that the following identity is 
satisfied for all states and triples of parts: $I(A; B, C) = I(A; C) + I(A; B \mid C)$. 

Since in this paper we are writing Information Causality in terms of entropy (equation (4)) rather than mutual 
information (equation (3)) we need another property defining the entropy. 

(4) Mutual Information and Entropy: For every system $A$ there exists a non-negative number $H(A)$ that equals the 
Shannon entropy of $A$ if it is classical. Furthermore, for subsystems $A$ and $B$ we have 
$I(A; B) = H(A) + H(B) - H(A, B)$. By $H(A \mid B)$ we mean $H(A, B) - H(B)$. 

Remark II.1. In [16] and [17] equation (4) has been derived from some postulates directly on the entropy rather than 
mutual information. The above four properties thus can be replaced with two. 

We will impose yet another property on mutual information called the 'Accessibility of Mutual Information' (AMI). 

(5) Accessibility of Mutual Information (AMI): Consider arbitrary subsystems $A$ and $B$ where $A$ is classical. Let 
$(A_1, B_1), (A_2, B_2), \dots, (A_n, B_n)$ denote $n$ independent copies of $(A, B)$. Then for any $\epsilon > 0$, 
there exists some $n$ and a local state transformation $(B_1, \dots, B_n) \to e_n$, such that $e_n$ is classical and 

$$\frac{1}{n} I(A_1 \dots A_n; e_n) \ge I(A; B) - \epsilon.$$



Properties (1), (3) and (4) have nothing to do with the space of valid state transformations in the underlying 
physical theory. Property (2), on the other hand, restricts this space, and can always be satisfied by putting enough 
constraints on the set of valid maps between systems. For instance data processing inequality becomes trivial in a 
physical theory whose only valid state transformation is the identity map. Thus to avoid such obscure examples, an 
information theoretic approach to study physical theories has to provide another postulate, besides (1)-(4), about the 
richness of the space of valid state transformations. From this point of view AMI, which ensures the existence of 
certain maps, is a natural property. Moreover, this is the property that formalizes our intuition of mutual 
information; without it, the mutual information function on non-classical systems has no tangible meaning. 

Observe that AMI holds in quantum physics because the Holevo outer bound on the accessible information is 
asymptotically achievable by the pretty good measurement. 

III. INFORMATION CAUSALITY AND THE GRAY-WYNER PROBLEM 

As discussed in Sec. I A the game of Information Causality is closely related to the Gray-Wyner problem. In 
the latter, there are one encoder (Alice) and $N$ decoders (Bob$_1$, ..., Bob$_N$). Alice observes i.i.d. copies of 
$\vec{a} = (a_1, \dots, a_N)$, where $a_i \in \mathcal{A}_i$ takes discrete values, and can send a public message at 
rate $R_0$ to all Bobs, and $N$ private messages at rates $R_1, R_2, \dots, R_N$ (at rate $R_i$ to Bob$_i$). The goal 
is for Bob$_i$ to recover the i.i.d. copies of $a_i$ with probability of error converging to zero as the number of 
i.i.d. observations goes to infinity (see Fig. 1). The Gray-Wyner region $\mathcal{R}$ is defined to be the set of 
achievable rate vectors $(R_0, R_1, \dots, R_N)$, i.e., $(R_0, R_1, \dots, R_N) \in \mathcal{R}$ if by sending public 
and private information at rates $R_0$ and $R_1, \dots, R_N$ respectively, Bobs' demands can be fulfilled. 

The set $\mathcal{R}$ of achievable rate vectors for the Gray-Wyner problem is completely characterized: 
$(R_0, R_1, \dots, R_N) \in \mathcal{R}$ if and only if there exists an auxiliary random variable $e$ such that 

$$R_0 \ge I(\vec{a}; e), \tag{5}$$

$$R_i \ge H(a_i \mid e), \quad 1 \le i \le N. \tag{6}$$

Note that $e$ is classical and is determined by the conditional probability vector $p(e \mid \vec{a})$. Moreover, 
without loss of generality we may assume that $e$ takes values in a discrete set of size 
$\prod_i |\mathcal{A}_i| + 2$ [18]. This enables us to explicitly compute $\mathcal{R}$. 
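
To make conditions (5) and (6) concrete, the sketch below evaluates the corner point $(I(\vec{a}; e), H(a_1 \mid e), \dots, H(a_N \mid e))$ generated by one particular choice of $p(e \mid \vec{a})$. It is only an illustration in plain Python: the function name `gray_wyner_point`, the example distribution on two correlated bits, and the three test channels are assumptions made here, and the sketch does not search over all auxiliary variables.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gray_wyner_point(p_a, p_e_given_a):
    """Return (I(a; e), [H(a_i | e) for each i]) for one auxiliary variable e.

    p_a: dict {(a_1, ..., a_N): prob}; p_e_given_a: dict {a_tuple: {e: prob}}.
    """
    n = len(next(iter(p_a)))
    joint = {}  # p(a, e)
    for a, pa in p_a.items():
        for e, pe in p_e_given_a[a].items():
            if pa * pe > 0:
                joint[(a, e)] = pa * pe
    p_e = {}
    for (a, e), p in joint.items():
        p_e[e] = p_e.get(e, 0.0) + p
    h_a, h_e, h_ae = entropy(p_a.values()), entropy(p_e.values()), entropy(joint.values())
    r0 = h_a + h_e - h_ae  # I(a; e), right-hand side of (5)
    private = []
    for i in range(n):
        p_ai_e = {}
        for (a, e), p in joint.items():
            p_ai_e[(a[i], e)] = p_ai_e.get((a[i], e), 0.0) + p
        private.append(entropy(p_ai_e.values()) - h_e)  # H(a_i | e), right-hand side of (6)
    return r0, private


# Hypothetical source: two correlated bits that agree with probability 0.8.
p_a = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
channels = {
    "constant e": {a: {0: 1.0} for a in p_a},
    "e = a_1": {a: {a[0]: 1.0} for a in p_a},
    "e = (a_1, a_2)": {a: {a: 1.0} for a in p_a},
}
h_a = entropy(p_a.values())
points = {name: gray_wyner_point(p_a, ch) for name, ch in channels.items()}
for name, (r0, private) in points.items():
    print(name, round(r0, 4), [round(r, 4) for r in private])
```

Every point produced this way satisfies $R_0 + \sum_i R_i \ge H(\vec{a})$; with a constant $e$ the sum is strictly larger because the two bits are correlated.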

Let us now consider the Information Causality game in the classical case and consider the $(N + 1)$-tuple 

$$(H(x), H(a_1 \mid \beta_1, b = 1), H(a_2 \mid \beta_2, b = 2), \dots, H(a_N \mid \beta_N, b = N)). \tag{7}$$

We claim that this vector is in the set $\mathcal{R}$. One can see this by noting that the Gray-Wyner problem is 
indeed the Information Causality game allowed to be played many times. Equivalently, for the choice of $e = (x, c)$, 
where $c$ is the common randomness shared between the two parties, one has 

$$H(x) \ge I(\vec{a}; e), \tag{8}$$

$$H(a_i \mid \beta_i, b = i) \ge H(a_i \mid e), \quad 1 \le i \le N. \tag{9}$$

The first inequality comes from the fact that $c$ is independent of $\vec{a}$, and the second one is the data 
processing inequality. Now adding up these equations, we obtain 

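$$H(x) + \sum_{i=1}^{N} H(a_i \mid \beta_i, b = i) \ge I(\vec{a}; e) + \sum_{i=1}^{N} H(a_i \mid e)$$

$$= H(\vec{a}) - H(\vec{a} \mid e) + \sum_{i=1}^{N} H(a_i \mid e)$$

$$\ge H(\vec{a}),$$

where in the last line we use the subadditivity of entropy, $H(\vec{a} \mid e) \le \sum_{i} H(a_i \mid e)$. Thus we 
obtain the Information Causality bound in the classical world. In general, summation of (8) and (9) with weights 
$w_i \ge 0$ gives 

$$w_0 H(x) + \sum_{i=1}^{N} w_i H(a_i \mid \beta_i, b = i) \ge w_0 I(\vec{a}; e) + \sum_{i=1}^{N} w_i H(a_i \mid e). \tag{10}$$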

Now if we go beyond the classical world and for instance consider the quantum theory, it is no longer clear that 
the $(N + 1)$-tuple of equation (7) would always fall in the set $\mathcal{R}$. If we attempt to mimic the above 
proof, shared randomness should be replaced with shared entanglement. If we let $B$ be Bob's subsystem of the shared 
quantum state, then we need to identify the auxiliary random variable $e$ by $e = (x, B)$. In this case, inequalities 
(8) and (9) do hold because firstly, $B$ is independent of $\vec{a}$ and secondly, the data processing inequality is 
still available. However, our choice of $e$ is not classical anymore, while the Gray-Wyner region is defined using a 
classical auxiliary random variable $e$. Furthermore, because cloning is impossible in the quantum theory, we can no 
longer consider several Bobs and the conceptual link with the Gray-Wyner problem is lost. 

One of the main contributions of this paper is that the vector of equation (7) does indeed fall in $\mathcal{R}$ in 
the quantum world. More generally, we show that the $(N + 1)$-tuple falls in $\mathcal{R}$ in any physical theory that 
satisfies properties (1-5) mentioned in Sec. II. This is made rigorous and proved in Section IV. But now we discuss 
some implications of this result. 



A. Examples and implications 

In an arbitrary physical theory, if the $(N + 1)$-tuple (7) falls in $\mathcal{R}$, equation (10) holds for some 
$p(e \mid \vec{a})$. In other words, for arbitrary coefficients $w_i \ge 0$ we have 

$$w_0 H(x) + \sum_{i=1}^{N} w_i H(a_i \mid \beta_i, b = i) \ge w_0 H(\vec{a}) + \inf_{p(e \mid \vec{a})} \left( -w_0 H(\vec{a} \mid e) + \sum_{i=1}^{N} w_i H(a_i \mid e) \right). \tag{11}$$


The above equation can be thought of as a generalization of Information Causality (4). This generalization is strictly 
stronger than the original one. For instance, consider the following scenario where Bob receives either an index 
$1 \le b \le N$ or two indices $1 \le b_1, b_2 \le N$. In the former case he wants to recover $a_b$, and in the latter 
both $a_{b_1}$ and $a_{b_2}$. We claim that (4) is loose for this game while we offer much stronger inequalities. To 
see this note that (4) can be written as 

$$H(x) + \sum_{i=1}^{N} H(a_i \mid \beta_i, b = i) + \sum_{i,j=1}^{N} H(a_i, a_j \mid \beta_i, \beta_j, (b_1, b_2) = (i, j)) \ge H(\vec{a}).$$

However, in our generalization we can set the coefficient $w_0 = 1$, and $w_i = 1$ for $i \ge 1$ when the $i$-th term 
corresponds to the case of Bob receiving a single index, and $w_i = 0$ otherwise. In this way we get rid of the terms 
of the form $H(a_i, a_j \mid \beta_i, \beta_j, (b_1, b_2) = (i, j))$ because their coefficient is zero. We obtain the 
strictly stronger inequality 

$$H(x) + \sum_{i=1}^{N} H(a_i \mid \beta_i, b = i) \ge \inf_{p(e \mid \vec{a})} \left( I(\vec{a}; e) + \sum_{i=1}^{N} H(a_i \mid e) \right) \ge H(\vec{a}),$$

where in the last step we have used the subadditivity of entropy. 

To illustrate the benefit of expressing the game of Information Causality in terms of the Gray-Wyner region we 
consider another example. Assume that $N = 2$ and Bob receives an index $b \in \{1, 2\}$ and wants to recover $a_b$. 
We assume that the random variables $a_1, a_2$ are correlated. By the original Information Causality equation (4), we 
have 

$$H(x) + H(a_1 \mid \beta_1, b = 1) + H(a_2 \mid \beta_2, b = 2) \ge H(a_1, a_2).$$

Now we claim that if $H(x)$ is sufficiently small, the above inequality is loose. This is clear when $H(x) = 0$ 
because $a_1, a_2$ are correlated. In general the inequality is loose whenever $H(x)$ is less than $J(a_1, a_2)$, 
Wyner's common information between $a_1$ and $a_2$ [28]. To prove the above claim we exploit a known result about 
the two-receiver Gray-Wyner problem (see pp. 367-368 of [19]) saying that the minimum value of $R_0$ when 
$R_0 + R_1 + R_2 = H(a_1, a_2)$ is Wyner's common information. The claim follows from this result and the fact 
that the triple $(R_0 = H(x), R_1 = H(a_1 \mid \beta_1, b = 1), R_2 = H(a_2 \mid \beta_2, b = 2))$ is in the 
Gray-Wyner region. Thus, when $R_0 = H(x) < J(a_1, a_2)$, we should look into the full characterization of the 
Gray-Wyner region, which can be expressed in terms of all inequalities of the form (11). 
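
As a worked step for the extreme case $H(x) = 0$, the following sketch uses only conditions (5) and (6) with $N = 2$. A public rate $R_0 = 0$ forces the auxiliary variable $e$ to be independent of $(a_1, a_2)$, and hence of each $a_i$ separately:

```latex
\begin{align*}
R_0 = 0 \;&\Rightarrow\; I(a_1, a_2; e) = 0 \;\Rightarrow\; I(a_i; e) = 0, \quad i = 1, 2, \\
R_1 + R_2 \;&\ge\; H(a_1 \mid e) + H(a_2 \mid e) = H(a_1) + H(a_2) = H(a_1, a_2) + I(a_1; a_2).
\end{align*}
```

So in this case the Gray-Wyner region demands $I(a_1; a_2)$ more bits than the bound $R_1 + R_2 \ge H(a_1, a_2)$ obtained from (4), and the gap is strictly positive whenever $a_1$ and $a_2$ are correlated.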

When Alice's inputs $a_1, a_2, \dots, a_N$ are mutually independent, the Gray-Wyner region is fully characterized by 
the single inequality with all weights $w_i = 1$; thus our main contribution is in the case where the $a_i$'s are not 
mutually independent. 

IV. PROOF OF THE MAIN RESULT 

In this section we prove the following theorem. 

Theorem IV.1. Fix a strategy for the Information Causality game in a physical theory that satisfies properties (1-5) 
of Sec. II including Accessibility of Mutual Information. Then the $(N + 1)$-tuple 

$$(H(x), H(a_1 \mid \beta_1, b = 1), H(a_2 \mid \beta_2, b = 2), \dots, H(a_N \mid \beta_N, b = N))$$

falls in the Gray-Wyner region $\mathcal{R}$ corresponding to $p(a_1, a_2, \dots, a_N)$ given by equations (5) and (6). 

Remark IV.2. As in [6], we would like to emphasize that all of the quantities showing up in the statement of the 
theorem "do not involve the details of a particular physical model but are fully determined by Alice's and Bob's input 
bits and Bob's output." 

Proof. Let $u = (x, B)$. We claim that 

$$H(x) \ge I(\vec{a}; u), \tag{12}$$

$$H(a_i \mid \beta_i, b = i) \ge H(a_i \mid u), \quad 1 \le i \le N. \tag{13}$$

The first inequality was proved in [6] and holds because $B$ is independent of $\vec{a}$. The second one comes from 
the data processing inequality and the fact that $b$ is independent of $(\vec{a}, x, B)$. 

For a physical system $a$ (random variable in the classical case) we denote by $a^n$ the $n$ independent copies of 
$a$. Fix an arbitrary $\epsilon > 0$. The rest of the proof can be divided into two parts. 

(I) There exists a natural number $n$ and a random variable $e_n$ determined by the conditional distribution 
$p(e_n \mid \vec{a}^n)$ such that 

$$I(\vec{a}; u) \ge \frac{1}{n} I(\vec{a}^n; e_n), \tag{14}$$

$$H(a_i \mid u) \ge \frac{1}{n} H(a_i^n \mid e_n) - \epsilon, \quad 1 \le i \le N. \tag{15}$$

(II) There exists a random variable $e^*$ (again determined by $p(e^* \mid \vec{a})$) such that 

$$\frac{1}{n} I(\vec{a}^n; e_n) = I(\vec{a}; e^*),$$

$$\frac{1}{n} H(a_i^n \mid e_n) \ge H(a_i \mid e^*), \quad 1 \le i \le N.$$

(I) and (II) together with (12) and (13) imply that 

$$H(x) \ge I(\vec{a}; e^*),$$

$$H(a_i \mid \beta_i, b = i) \ge H(a_i \mid e^*) - \epsilon, \quad 1 \le i \le N.$$

In other words, $(H(x), H(a_1 \mid \beta_1, b = 1) + \epsilon, H(a_2 \mid \beta_2, b = 2) + \epsilon, \dots, 
H(a_N \mid \beta_N, b = N) + \epsilon)$ belongs to $\mathcal{R}$. Since $\epsilon > 0$ is arbitrary and the set 
$\mathcal{R}$ is closed [29], we obtain the desired result. 

We prove (I) here and leave the proof of (II), which follows from standard tricks in information theory, for 
Appendix A. 

AMI for the pair $(\vec{a}, u)$ implies that there exist a natural number $n$ and a local processing $u^n \to e_n$, 
where $e_n$ is classical, such that 

$$\frac{1}{n} I(\vec{a}^n; e_n) \ge I(\vec{a}; u) - \epsilon. \tag{16}$$

Equation (14) then follows from the data processing inequality: 
$I(e_n; \vec{a}^n) \le I(u^n; \vec{a}^n) = n I(u; \vec{a})$. 
The proof of (15) is easier if we rewrite it as 

$$\frac{1}{n} I(a_i^n; e_n) \ge I(a_i; u) - \epsilon.$$

By the data processing inequality [30] we have $I(\vec{a}^n; u^n \mid a_i^n) \ge I(\vec{a}^n; e_n \mid a_i^n)$. Then 
using the fact that $\vec{a}^n$ contains $a_i^n$ we obtain 

$$I(\vec{a}; u) - I(a_i; u) \ge \frac{1}{n} \left( I(\vec{a}^n; e_n) - I(a_i^n; e_n) \right),$$

or equivalently 

$$I(\vec{a}; u) - \frac{1}{n} I(\vec{a}^n; e_n) \ge I(a_i; u) - \frac{1}{n} I(a_i^n; e_n).$$

But by (16), the left hand side is at most $\epsilon$, and we are done. 



V. SIMULATION OF NON-LOCAL CORRELATIONS 



Finding the amount of classical communication required to simulate non-local correlations is a well-known method 
to quantify non-locality (see e.g. [8-15]). In this section we argue that Information Causality could enhance our 
understanding of this problem. Furthermore, because the problem of simulating non-locality is important on its own, 
we also aim to encourage study of quantification of non-locality from an information theoretic perspective, rather 
than a communication complexity one. 

Given a certain non-local box, we consider the problem of the minimum amount of communication from Alice to 
Bob required to simulate the box. It is possible to consider multi-round communication scenarios as well, but unless 
stated otherwise we consider only one-way communication schemes. Here we assume that Alice and Bob have infinite 
shared randomness, and similarly the entanglement-assisted version of this problem can be considered. We prove that 
Information Causality leads to a lower bound on the communication cost of simulating a given non-local box when 
the two parties are provided with preshared entanglement. 

Take an arbitrary non-local box, and consider the Information Causality game when Alice and Bob are provided 
with copies of this box as a resource. (This is indeed the scenario considered in [6].) Moreover, take a certain scheme 
where a message $x$ is transmitted and a random variable $\beta_i$ is constructed by Bob. Assume that the two parties 
use $k$ copies of the box in this protocol. We would like to simulate this scheme by two new parties, say Alice' and 
Bob', who have access only to quantum entanglement as their resource at the outset. 

Let $C_{\text{box}}$ be the entanglement-assisted communication cost of simulating the non-local box. Alice' and Bob' 
can simulate the scheme of Alice and Bob by first sending $k C_{\text{box}}$ bits from Alice' to Bob' to simulate the 
$k$ boxes, and then $H(x)$ bits to simulate the message that was passed from Alice to Bob. This enables Bob' to 
faithfully simulate $\beta_i$. Now it is legitimate to write the Information Causality principle for the simulated 
protocol because it is happening in the quantum world. The total size of the transmitted message is 
$k C_{\text{box}} + H(x)$. Therefore, 

$$k C_{\text{box}} + H(x) \ge \sum_{i=1}^{N} I(a_i; \beta_i \mid b = i).$$

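As an example let us consider the imperfect PR-box with bias $\epsilon$. That is, for binary inputs $a, b$ and outputs 
$x, y$, $x \oplus y = ab$ with probability $\frac{1 + \epsilon}{2}$. The scheme provided in [6] for $N = 2^n$ uses 
$k = 2^n - 1$ of these boxes, and the right hand side of the above equation is computed to be 
$2^n \left( 1 - h\left( \frac{1 + \epsilon^n}{2} \right) \right)$, where $h$ denotes the binary entropy function. This 
gives us the following equation, which results in a lower bound on $C_{\text{box}}$: 

$$(2^n - 1) C_{\text{box}} + 1 \ge 2^n \left( 1 - h\left( \frac{1 + \epsilon^n}{2} \right) \right).$$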

Computing this lower bound for all $n$ and taking the optimal one for every $\epsilon$, we obtain the plot of Fig. 2. 
We see that the lower bound is equal to one at $\epsilon = 1$, thus it has to be tight at this point. By [6], the 
above lower bound (for $n$ converging to infinity) would also be tight at the other end point 
$\epsilon \le \frac{1}{\sqrt{2}} \approx 0.707$. However, it may be loose in between because firstly we have considered 
the specific scheme of [6] for using boxes, and secondly this lower bound holds more generally for any physical theory 
satisfying the properties of mutual information given in [6] and not only for quantum physics. Nonetheless, we would 
like to highlight that the lower bound at $\epsilon = 1$ is tight in any such physical theory, as shown in the figure. 
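
The optimization over $n$ described above can be sketched in a few lines of Python. This is an illustrative calculation, not the authors' code: it assumes the right-hand side $2^n (1 - h(\frac{1 + \epsilon^n}{2}))$ with $h$ the binary entropy, truncates the search at a hypothetical `max_n`, and clips the result at zero since a communication cost cannot be negative. The function names are chosen here for illustration.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def cbox_lower_bound(eps, max_n=40):
    """Best lower bound on C_box over n = 1..max_n.

    Rearranges (2^n - 1) C_box + 1 >= 2^n (1 - h((1 + eps^n) / 2)) for C_box.
    max_n is an assumed truncation of the search over n.
    """
    best = 0.0  # the trivial bound: a cost is never negative
    for n in range(1, max_n + 1):
        rhs = 2 ** n * (1 - binary_entropy((1 + eps ** n) / 2))
        best = max(best, (rhs - 1) / (2 ** n - 1))
    return best


for eps in (0.7, 0.8, 0.9, 1.0):
    print(eps, cbox_lower_bound(eps))
```

At $\epsilon = 1$ every $n$ gives the same bound of one bit, while for $\epsilon$ below $1/\sqrt{2}$ no $n$ gives a positive bound (up to floating-point rounding), matching the end points discussed in the text.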

Although this method for finding a lower bound on the entanglement-assisted communication cost of simulating 
non-local boxes works for any box, we only have the specific example of imperfect PR-boxes. This is because, to the 
best knowledge of the authors, PR-boxes (and their generalizations [20]) are the only example of non-local boxes for which 
a relatively efficient scheme for solving communication problems is known. 

A. Non-local box simulation from an information theoretic perspective 

In the previous section we considered the problem of simulation of non-local boxes when the two parties share entan- 
glement. In this section, we are interested in the same problem when the parties share classical common randomness. 
Our purpose here is to draw connections between the problem at hand, and a control problem studied by information 
theorists. We will report a formula that gives an exact expression for the optimal amount of communication needed 
for non-local simulation given preshared randomness. It should be also noted that the information theoretic charac- 
terization of the communication cost serves as a lower bound on the communication complexity characterization of 
the bound, because the former setup considers asymptotic behaviors and is more relaxed. 

Information theorists have looked at the problem of simulating non-local correlations in a different context without 
linking it to quantum physics. Indeed this problem is related to the problem of coordinating distributed controllers to 
carry out some joint action (see [21, 22]). Now, if one were to formulate the problem of simulating non-local correlations 
in an information theoretic framework, it would go along the following lines: we are interested in simulating many 
independent copies of a non-local box, and ask for the minimum communication needed per box. Interestingly this 
formulation coincides with the formulation of the control problem. 

More precisely, take an arbitrary bipartite box $p(x, y \mid a, b)$. Random variables $x$ and $y$ are not meaningful 
unless we specify a joint distribution on the pair $(a, b)$. So let us fix a distribution $p(a, b)$ on $a, b$ as well. 
Now consider the following problem. Assume that Alice and Bob are observing i.i.d. copies of $a$ and $b$ respectively. 
Alice is interested in creating i.i.d. copies of $x$ whereas Bob is interested in creating i.i.d. copies of $y$ within 
a vanishing total variation distance from the distribution $p(x, y, a, b)$. To accomplish this, the two nodes exchange 
messages. The goal is to find the minimum communication rates, i.e. the number of bits exchanged per i.i.d. 
observation of $a, b$. A formal definition of the simulation problem can be found in [23]. 

FIG. 2: A lower bound on the entanglement-assisted one-way communication cost of simulating imperfect PR-boxes with 
parameter $\epsilon$, where $p(x \oplus y = ab) = \frac{1 + \epsilon}{2}$. Observe that for the Tsirelson bound 
$\epsilon = \frac{1}{\sqrt{2}} \approx 0.707$, $C_{\text{box}} = 0$. This lower bound is an implication of Information 
Causality. 

The information theoretic treatment of the above problem assumes that common randomness is provided to the parties 
at a given rate. However, in our setting we assume that the two parties are provided with infinite common 
randomness. It is shown in [24] that the minimum one-way communication rate from Alice to Bob with infinite 
preshared randomness, when both $b$ and $x$ are constant random variables, is equal to $I(a; y)$. The exact formula 
for the communication rate has been obtained by Yassaee [25]. The answer is the minimum of $I(a; u \mid b)$ over all 
classical random variables $u$ determined by $p(u \mid a, b, x, y)$ such that the joint distribution 
$p(u, a, b, x, y)$ factorizes as $p(u, a, b, x, y) = p(a, b) p(u \mid a) p(x \mid u, a) p(y \mid u, b)$. Its proof 
is beyond the scope of this paper [31]. 

For the imperfect PR-boxes defined above, with the uniform distribution on inputs (p(a, b) = p(a)p(b) = 1/4), independence 
of a and b implies that I(a;u|b) = I(a;u), since u depends on b only through a. Moreover, u can be taken to be a binary 
random variable using the Fenchel extension of the Carathéodory theorem. Computing the optimal rate for every ε is then a 
straightforward optimization problem. Fig. 3 gives the one-way communication cost of winning the CHSH game with probability p. 
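
The identity I(a;u|b) = I(a;u) for independent inputs can be checked numerically. The sketch below is a minimal illustration, assuming a joint distribution of the form p(a)p(b)p(u|a) with uniform binary a and b. The channel p(u|a) uses arbitrary, hypothetical numbers chosen only for the check; it is not the optimizer from [25], and the helper names are not from the paper.

```python
import itertools
import math


def mutual_information(joint):
    """I(A;U) in bits from a dict {(a, u): probability}."""
    pa, pu = {}, {}
    for (a, u), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pu[u] = pu.get(u, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pu[u]))
        for (a, u), p in joint.items()
        if p > 0
    )


def conditional_mutual_information(joint):
    """I(A;U|B) in bits from a dict {(a, u, b): probability}."""
    pb = {}
    for (a, u, b), p in joint.items():
        pb[b] = pb.get(b, 0.0) + p
    total = 0.0
    for b0, weight in pb.items():
        if weight == 0:
            continue
        # Condition on B = b0, then average the resulting mutual information.
        cond = {(a, u): p / weight for (a, u, b), p in joint.items() if b == b0}
        total += weight * mutual_information(cond)
    return total


# Hypothetical binary channel p(u | a); the numbers are illustrative only.
p_u_given_a = {0: (0.9, 0.1), 1: (0.2, 0.8)}

# Joint p(a, u, b) = p(a) p(b) p(u | a) with uniform, independent a and b.
joint_aub = {
    (a, u, b): 0.25 * p_u_given_a[a][u]
    for a, u, b in itertools.product((0, 1), repeat=3)
}
joint_au = {
    (a, u): 0.5 * p_u_given_a[a][u]
    for a, u in itertools.product((0, 1), repeat=2)
}

i_cond = conditional_mutual_information(joint_aub)
i_marg = mutual_information(joint_au)
print(f"I(a;u|b) = {i_cond:.6f} bits, I(a;u) = {i_marg:.6f} bits")
```

The two quantities agree because conditioning on an independent b leaves the pair (a, u) distributed as before. A full rate computation would additionally minimize over all p(u|a), p(x|u,a), p(y|u,b) that reproduce the box, which this sketch does not attempt.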

VI. CONCLUSION 

We generalized and improved our understanding of Information Causality by connecting it to the Gray-Wyner 
problem. We showed that, assuming a new property of the underlying physical theory (Accessibility of Mutual 
Information, AMI), the classical Gray-Wyner region completely characterizes the game of Information Causality. That is, we 
provide an infinite family of inequalities, one of which is Information Causality. Our assumption of Accessibility of 
Mutual Information is obvious in the classical case and holds in quantum theory. Since the underlying assumptions 
of Information Causality have recently been studied for a general physical theory [26, 27], it is a natural question 
whether AMI can be verified for other physical theories. It is also interesting to see whether other Tsirelson 
inequalities (besides the CHSH example) can be derived from these new inequalities. Moreover, two-way communication 
protocols and multiparty scenarios are natural extensions of the Information Causality game and can be studied in a 
general physical theory. 

We also studied the problem of simulating non-local correlations. We showed that Information Causality gives a 
bound on the rate of required communication assuming preshared entanglement. Moreover, we reported a formula to 
compute the optimal rate of required communication assuming infinite preshared randomness. 
FIG. 3: The one-way communication cost of winning the CHSH game with probability p = (1 + ε)/2, assuming preshared randomness. 
Appendix A: Part II of the proof of Theorem IV.1 

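Following the proof of Theorem IV.1, we need to show that there exists some random variable $e^*$ such that 

\[
\begin{aligned}
\frac{1}{n} I(\vec{a}^{\,n}; e_n) &= I(\vec{a}; e^*), \\
\frac{1}{n} H(a_i^n \mid e_n) &\ge H(a_i \mid e^*), \qquad 1 \le i \le N.
\end{aligned}
\]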

This says that the n-letter version of the Gray-Wyner capacity region is equal to its single-letter version, which holds 
because it is a general property of capacity regions in information theory. Nonetheless, to be self-contained, we 
provide a proof. 

As before, $a^n$ denotes n independent copies of a. For $1 \le i \le j \le n$ we let $a^{(i:j)}$ be the $j - i + 1$ copies of a 
starting with the i-th one. For simplicity we denote $a^{(i:i)}$ by $a^{(i)}$. Moreover, for notational convenience we introduce 
$\vec{a}^* = (a_1^*, a_2^*, \ldots, a_N^*)$ having the same joint distribution as $\vec{a} = (a_1, a_2, \ldots, a_N)$, and find 
$p(e^* \mid \vec{a}^*)$ such that 

\[
\begin{aligned}
\frac{1}{n} I(\vec{a}^{\,n}; e_n) &= I(\vec{a}^*; e^*), \\
\frac{1}{n} H(a_i^n \mid e_n) &\ge H(a_i^* \mid e^*), \qquad 1 \le i \le N.
\end{aligned}
\]

Let q be a random variable uniform on {1, 2, . . . , n}, independent of all previous random variables. Define 

\[
e^* = \left(e_n, \vec{a}^{(1:q-1)}, q\right),
\]

and set $a_i^* = a_i^{(q)}$, which by our convention is the q-th i.i.d. copy of $a_i$. Observe that $(a_1^*, a_2^*, \ldots, a_N^*)$ has the 
same joint distribution as $(a_1, a_2, \ldots, a_N)$. Then using the chain rule we obtain 

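\[
\begin{aligned}
I(\vec{a}^*; e^*) &= I\left(\vec{a}^{(q)}; e_n, \vec{a}^{(1:q-1)}, q\right) \\
&= I\left(\vec{a}^{(q)}; q\right) + I\left(\vec{a}^{(q)}; \vec{a}^{(1:q-1)} \mid q\right) + I\left(\vec{a}^{(q)}; e_n \mid q, \vec{a}^{(1:q-1)}\right) \\
&= 0 + 0 + \frac{1}{n} \sum_{q=1}^{n} I\left(\vec{a}^{(q)}; e_n \mid \vec{a}^{(1:q-1)}\right) \\
&= \frac{1}{n} I(\vec{a}^{\,n}; e_n).
\end{aligned}
\]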

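Furthermore, by the data processing inequality we have 

\[
H(a_i^* \mid e^*) = H\left(a_i^{(q)} \mid e_n, \vec{a}^{(1:q-1)}, q\right)
= \frac{1}{n} \sum_{q=1}^{n} H\left(a_i^{(q)} \mid e_n, \vec{a}^{(1:q-1)}\right)
\le \frac{1}{n} \sum_{q=1}^{n} H\left(a_i^{(q)} \mid e_n, a_i^{(1:q-1)}\right)
= \frac{1}{n} H(a_i^n \mid e_n).
\]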